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Abstract 


In this paper a new genetic algorithm called the Breeder Genetic Al- 
gorithm (BGA) is introduced. The BGA is based on artificial selection 
similar to that used by human breeders. A predictive model for the BGA is 
presented which is derived from quantitative genetics. The model is used to 
predict the behavior of the BGA for simple test functions. Different mutation 
schemes are compared by computing the expected progress to the solution. 
The numerical performance of the BGA is demonstrated on a test suite of 
multimodal functions. The number of function evaluations needed to locate 
the optimum scales only as n In(n) where n is the number of parameters. 
Results up to n = 1000 are reported. 


1 Introduction 


In (Mühlenbein, Gorges-Schleuter & Kramer 1988), (Mühlenbein, 1991), (Mühlen- 
bein, 1992b) the parallel genetic algorithm (PGA) was successfully applied to com- 
binatorical problems. Continuous parameter optimization has been described in 
(Mühlenbein, Schomisch & Born, 1991). We have now substantially improved the 
results obtained with the PGA. The improvements are the result of a new genetic 
algorithm. This algorithm we call the Distributed Breeder Genetic Algorithm 
(DBGA). The DBGA is inspired by breeding. Each one of a number of virtual 


breeders has the task of improving its own subpopulation. Occasionally a breeder 
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gets a new individual from its neighboring breeders. A DBGA with a single virtual 
breeder is called a Breeder Genetic Algorithm (BGA). 

Whereas the PGA models natural and self-organized evolution, the DBGA is 
based on rational selection performed by human breeders. In biological terminol- 
ogy: the PGA models natural selection and the DBGA models artificial selection. 
We expected artificial selection to be more efficient for optimization than natural se- 
lection. We will show in this paper that this is indeed the case. This result does not 
imply that the DBGA is “better” than the PGA. Both algorithms are important, 
both from a biological and a computer science point of view. The DBGA models ra- 
tional controlled evolution whereas the PGA models evolution which self-organizes. 

The major goal of this paper is to show that the BGA can profit, both in theory 
and practice, from the long experience accumulated by human breeders. In the last 
two hundred years, starting just before Darwin, breeding of animals has advanced 
from an art based on intuition to an empirical science based on statistics. The BGA 
has been strictly designed according to this science of breeding. But until recently, 
there has been a major difference between a real breeder and the virtual breeder 
of our genetic algorithm. The human breeder does not have information about the 
genetic material, he has to estimate aggregrate values which he calls the breeding 
value of an animal. The virtual breeder of our GA has knowledge about all the genes 
of his population. Furthermore he controls the genetic operators (i.e. mutation, 
recombination etc.). But with the introduction of biotechnology this distinction will 
probably soon vanish! 

The BGA is not radically new, it can be seen as a recombination between evolu- 
tion strategies ES (Schwefel, 1981), (Back, Hoffmeister & Schwefel, 1991) and genetic 
algorithms GA (Goldberg, 1989). The BGA uses truncation selection as performed 
by breeders. This selection scheme is similar to the (,A)-strategy in ES (Schwefel, 
1981). The search process of the BGA is mainly driven by recombination, making 
the BGA a genetic algorithm. Mutation is an important background operator for 
the BGA. The mutation rate is inversely proportional to the number of parameters 
to be optimized and the mutation range is fixed. 

The BGA is a random search method which can be applied to both discrete and 
continuous functions. In this paper the following questions will be answered 


e Given a mutation scheme, what is the expected progress of successful muta- 
tions for a single individual? 


e Given a selection and recombination schedule, what is the expected progress 
of the population? 


This approach is opposite to the standard GA analysis, which starts with the 
schema theorem. Mutation and recombination are only considered as disruptions. 
We see mutation and recombination as constructive operators. They are evaluated 
according to the probability that they create better solutions. 

The outline of the paper is as follows. In section 2 we formally describe the 
DBGA and the BGA. Mutation is analyzed in section 3. The approach of quan- 


titative genetics to artificial selection is explained in section 4. The framework is 
used to investigate selection and recombination. The selection equation is used in 
the following section to analyze proportionate selection. It is shown in section 5 
that proportionate selection is not a good scheme for optimization. The empirical 
laws derived in section 4 are investigated in section 6 for continuous parameter op- 
timization. The interaction of recombination and mutation is analyzed in section 7. 
Numerical results for dimensions up to 1000 are reported in section 8. 

We strongly believe that a good theory has to be proven with challenging appli- 
cations. Therefore we only describe that part of the theory which is necessary for 
understanding the rationale of the BGA. The theory is explained in more detail in 
(Mühlenbein & Schlierkamp-Voosen, 1992a). This paper concentrates on the BGA. 
The DBGA is investigated in (Mühlenbein & Schlierkamp-Voosen, 1992b). 


2 A formal description of the DBGA 


The description is analogous to that in (Mühlenbein, Schomisch & Born, 1991) where 
the PGA is described. The DBGA is an eight tuple 


DBGA = (P°, sub, N, sg,7, BGA, F, term) (1) 

where 

P®  := initial population 

sub := number of subgroups 

N := number of individuals per subgroup 

sq := number of neighboring subgroups 

T := migration schedule 

BGA := Breeder Genetic Algorithm 

F := fitness function 

term := termination criterion 


Each subgroup is controlled by a BGA. A BGA can be described by 


BGA = (P?,N,T,T,A, HC, F,term) (2) 


P? is the initial subpopulation, N the size of the population, T the truncation 
threshold, T the recombination operator, A the mutation operator and term the 
termination criterion. HC is a hill-climbing method. 

All numerical results in this paper have been obtained without local hill-climbing. 
We will show in the next section that the BGA mutation operator works almost as 
well as more specialized local hill-climbing methods which do not use derivatives 
of the function to be optimized. Therefore for continuous parameter optimization 
local hill-climbing is not as important as for discrete optimization problems. The 
importance of hill-climbing in discrete domains has been shown in (Mühlenbein, 


1991), (Mühlenbein, 1992a). 


We will now describe each operator in more detail. Their respective influence 
on the performance of the BGA will be investigated in the following sections. For a 
BGA run a set of genetic operators is defined. Then these operators are applied to 
the individuals of the population. There are no probabilities for the application of 
the operators. 


2.1 Selection 


The BGA uses a selection scheme called truncation selection. The T% best individ- 
uals are selected and mated randomly until the number of offspring is equal to the 
size of the population. The offspring generation replaces the parent population. The 
best individual found so far will remain in the population. Self-mating is prohibited. 


2.2 Recombination 


In (Mühlenbein, Schomisch & Born, 1991) we distinguished between recombination 
and crossing-over. The mixing of the variables was called recombination and the 
mixing of the values of a variable was named crossing-over. From a mathematical 
viewpoint this distinction is unnecessary. Any operator which combines the genetic 
material of two parents we will call a recombination operator in this paper. Recom- 
bination is the most important search operator of the BGA. We have implemented 
three different operators. 


e Discrete recombination 
Let x = (#1,...,%,) and y = (Yı,...,Yn) be the parent strings. Then the 
offspring z = (21,...,2n) is computed by 
zi = {ui} or {yi} (3) 
x; or y; are chosen with probability 0.5. 
e Extended intermediate recombination 
zi = ti + aly — ti) t=1,...,n (4) 
a; is chosen uniform randomly in the interval [—0.25, 1.25] 
e Extended line recombination 
zi = ti + aly — zi) t=1,...,n (5) 
a is chosen uniform randomly in [—0.25, 1.25]. 
The geometric effect of recombination has been explained in (Mühlenbein, Schomisch 


& Born, 1991). Discrete recombination generates corners of the hypercube defined 
by the components of x and y. Extended intermediate recombination can generate 
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any point within a slightly larger hypercube. Extended line recombination generates 
a point on the line defined by x and y. In both operators the new point may lie 
outside [z;, y;]. 

The rational of these operators are geometrically obvious. Discrete recombina- 
tion and intermediate recombination with a = 0.5 have been used successfully in 
genetic algorithms and evolution strategies. They have been already proposed by 
Bremermann (1966). 

Each recombination operator has been designed to solve specific problems which 
have been encountered. In the course of applying the BGA to more problems we 
will extend the three basic operators in the future. The next recombination operator 
will be based on three-tuple mating. Here three points (parents) will be recombined. 
General m-tuple mating has already been suggested by Bremermann (1966). 


2.3 Mutation 


A variable x; is selected with probability pm for mutation. The BGA normally 
uses pm = 1/n. At least one variable will be mutated. A value out of an interval 
[—range;,range;| is added to the selected variable. range; defines the mutation 
range. It is normally set to 0.1 - searchinterval;. searchinterval; is the domain of 
definition of variable z;. 

The new value z; is computed according to 


zi = t; range; ó (6) 


The + or — sign is chosen with probability 0.5. 6 is computed from a distribution 
which prefers small values. This is realized as follows 


15 
= Var" a; €0,1 
i=0 


Before mutation we set a; = 0. Then each a; is mutated to 1 with probability 
ps = 1/16. Only a; = 1 contributes to the sum. On the average there will be just 
one a; with value 1, say a;. Then ö is given by 


6 = 27 


The mutation operator is similar in spirit to that used by the PGA (Mihlenbein, 
Schomisch & Born, 1991), but the BGA operator is much more easy to understand. 
Furthermore, it is independent of the location in phenotype space. 

The standard BGA mutation operator is able to generate any point in the hy- 
percube with center x defined by x; + range;. But it tests much more often in the 
neigborhood of x. In the above standard setting, the mutation operator is able to 


locate the optimal x; up to a precision of range;-2~'°. The rationale of this mutation 
operator will be explained in the next section. 


3 Mutation in continuous domains 


The mutation operator has been investigated for binary domains in (Mihlenbein, 
1991), (Mühlenbein, 1992a). It was shown that the mutation rate should be inversely 
proportional to the number of bits in the chromosome. In this section we will show 
that an adaptation of the above strategy, the BGA mutation scheme, also works 
well in continuous domains. 

Our analysis is similar to the investigation of mutation in evolution strategies 
(Back, Hoffmeister & Schwefel, 1991), (Schwefel, 1981), (Rechenberg, 1973) and in 
random search methods (Torn & Zilinskas, 1989), (Solis & Wets, 1981). 

We will compare different mutation schemes according to the performance mea- 
sure expected progress E(r). Given an arbitrary point x with distance r to the 
optimum then E(r) is defined as the expected improvement of x by successful mu- 
tations in euclidian distance. Mutations giving no improvement are not counted. 
E(r) is defined by probability theory as an integral over the domain of successful 
mutations. The integrand is given by progress(y) * probability(y). The domain of 
successful mutation depends on the fitness function. We will assume for the analysis 
a unimodal function with spherical symmetry. 

First we will compute the expected progress in one dimension for different mu- 
tation schemes - uniform distributed mutation, normal distributed mutation and 
the BGA mutation scheme. We will show that the expected progress of the simple 
BGA scheme is only six times worse than normal distributed mutation with optimal 
adaptation. 


Uniform distributed mutation 


The mutation operator randomly chooses a number z in the interval defined by 
[—A, A]. A is called the mutation range. The new point is given by 


Im = LF2Z 


Let ||x|| = r. For convenience let the optimum be in 0. Then for A > 2r the 
expected progress of successful mutations is given by 
r g r? 
E(r) =2- f I de = — 7 
0) 0 2A 2A (7) 
The optimum progress is obtained for A = 2r. Similarly we have for A <r 
Ag A 
E(r) = —dı = — 8 
=f, sat = 7 (3) 


The optimum progress is obtained for A = r. The formulas for the case r < A < 2r 
are slightly more difficult and omitted here. This proves the next theorem. 


Theorem 1 For uniform distributed mutation the optimal mutation range is given 
by A=2r or A=r. In both cases the normalized expected progress is given by 


E(r) 1 
r I ©) 


Remark: If the mutation range is held constant, then the normalized expected 
progress goes to zero for r — 0. 
We will now investigate a distribution which more often chooses points in the 


neighborhood of r. 


Normal distributed mutation 


Normal distributed mutation is used in evolution strategies (Back, Hoffmeister, & 
Schwefel, 1991). The mutation operator chooses a random number z from a normal 
distribution N(0, g) with standard deviation ø. 

z will be a successful mutation if —2r < z < 0. The expected progress for these 
mutations can be computed as usual 


270 \J-r 


1 0 1? 
E(r,o) (/ —X exp IF gr +f (2r + «) exp — 35 sue) (10) 


After tedious, but straightforward integration one obtains 


a) + ie I exp- => dr (11) 


The standard deviation giving the optimal expected progress can be obtained from 
= = 0. We will not compute it here, because for high dimensions (n > 0) Rechen- 
berg(1973) was able to compute it approximately. We will use Rechenberg’s result 
later when we investigate n dimensions. From the above equation we obtain by 


o 
E(r,o) = —— Te (1-25 553 + exp — 


Taylor series expansion 


Theorem 2 For fixed o and r — 0 the expected progress is approximate 


E(r,o) & (12) 
270 
Foro =r one obtains the normalized progress 
E 
Eine) ~ 0.1 (13) 
r 


Remark: If o is held fixed then the normalized expected progress goes to zero for 
r — 0. A constant normalized expected progress is obtained if o is proportional 
tor. 

Both, uniform and normal distributed mutation need an adaptation of the mu- 
tation range in order to give a reasonable progress for small r. We will now show 
that the BGA mutation scheme does not need such an adaptation. 


The BGA mutation scheme 


The BGA mutation operator randomly chooses with probability 1/32 one of the 32 
points 


I (PA, 274A, sera 2° A) (14) 
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A defines the mutation range. The BGA mutation scheme can approximate the 
optimum only up to a precision of 2715 A 47!?, 


Theorem 3 The normalized expected progress for the BGA mutation scheme is 


bounded by 
OLOPE 
327 r 716 


(15) 
Proof: For simplicity, let for some 0 < k < 15 
A= ry 


Successful mutations have to be in the range —2r < z < 0. Then the expected 
progress is given by 


1 
(140 27t) (16) 


E(r) = 30" ( 


The normalized expected progress for the BGA mutation scheme is of the same 
order as the optimal schemes for uniform or normal distributed mutation. But the 
optimal schemes need the exact distance r to the optimum. In a real application the 
distance is not known. Therefore the mutation range has to be empirically adapted 
(Bäck, Hoffmeister & Schwefel, 1991). 

The BGA mutation is robust. It does not need an adaptation of the mutation 
range because it generates points more often in the neighborhood of the given point. 
Nevertheless the BGA mutation is only about three times less efficient than normal 
distributed mutation with o =r. 

We will now investigate mutation in high dimensions. We will start with the 
BGA mutation because it is the simplest to analyze. 


The BGA mutation in n dimensions 


A variable is chosen with probability p,, = 1/n for mutation. Therefore on average 
just one variable will be mutated. 


Theorem 4 Given a point with distance r < A to the optimum, then the expected 
progress in n dimensions is bounded by 
1 E(n,r) 1 


— < < — 1 
32n 7 r — Jn (17) 


Proof Let an arbitrary point with distance r be given. By rotation we move the 
point to the first axis. This will give the point (r,0,...,0). The rotation does not 
change the average progress because we asumed that the fitness function is spherical 
symmetric. Therefore 


4712]f a higher precision is needed, the mutation range will be reduced during the run. We will 
discuss this modification at the end of the section. 


E(n,r) = = Ba (r) 


E, (r) is the average progress in the direction of zı. The result now follows from 
theorem 3. I 

The normalized expected progress decreases with 1/n. We will now consider uniform 
distributed mutation. 

Uniform distributed mutation in n dimensions 


The mutation operator chooses n-times a number from the interval defined by 
[—A, A]. Let a point with distance r to the optimum be given. 


Theorem 5 The expected progress for uniform mutation is given for A > 2r by 


E(n,r) — oT m (18) 
r 2n(n +1) A” 
The optimal expected progress is obtained for A = 2r 
E 
(n, r) — m . 2-7 (19) 
r 2n(n + 1) 


Proof: We start with n = 2. Let ||x|| = ri. The expected progress can be obtained 
from the integral 


1 Ty 20 
E(2,rı) = | f (rı — r)rdödr 


Therefore 


For arbitrary n the expected progress is given by 


E(n,rı) = oir I fon —r)dh,(r)dr 


The inner integral has to be done in polar coordinates over the surface of the n- 
dimensional hypersphere of radius r. We computed 


n+l 
T ry 


u 2n(n +1) A” 


From this equation the theorem immediately follows. Ii 


E(n,rı) (20) 


The optimal normalized average progress for uniform distributed mutation decreases 
exponential with n. The reason for this behavior is the well known fact that the 
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volume of the unit sphere in n-dimensions goes to zero for n — oo. Better results 
can be obtained if the uniform distribution is restricted to a hypersphere with radius 
rp. For ra = 1.225r/,/n the expected progress was computed for large n and small 
r in (Schumer & Steiglitz, 1968) 


- (21) 


We now turn to normal distributed mutation. 


Normal distributed mutation in n dimensions 


The mutation operator chooses n-times a number randomly from a normal distri- 
bution N(0,o) with standard deviation ø. Let ||e]| = ra. 
The expected progress is given by 


1 0 r? rl r? 
E(ri, 0) = roy (E r exp zga dinar + l.. fen +r)exp rato) 
(22) 
The inner integral has to be done in polar coordinates over half of the surface of 
the n-dimensional hypersphere of radius r. This integral does not appear to be 
intractable. Nevertheless, we did not find a closed solution. However Rechenberg 
was able to compute the expected progress approximately for n > 0. We just cite 


the result (Rechenberg, 1973). 


Theorem 6 (Rechenberg): The optimal standard deviation for normal distributed 
mutation is given by 


r 
Oopt = 1.2 n 


The expected progress is approximately 


E(n,r, 0) _ 0.2 (23) 
r 
We now summarize the results of the above analysis. The results have been proved 
for unimodal fitness functions with rotational symmetry only. 

The expected progress for the simple BGA scheme is 4 — 6 times less than the 
expected progress for the normal distributed mutation with optimal a. It scales in 
n dimensions like 1/n which is believed to be the optimal progress rate for search 
methods which do not use derivatives of the function. We may thus propose the 
following rule for high dimensions. 


Rule 1 Do not change too many variables at the same time in high dimensions 
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The above rule was already found by Bremermann (1966). For problems of the kind 
llel = ||b- Mz|| minimal 


he suggested uniform mutation with a mutation range of 


let 
n 

This is a very simple and constructive adaptation scheme. But unfortunately an 
analysis immediately shows that the normalized expected progress gets worse with 
increasing || M ||. 

The formulas of this section can be used to estimate the number of trials to 
approximate the optimum with a given precision. For simplicity, let us assume a 
population of size 1. In this case the BGA accepts a mutated string only if it is not 
worse than the parent string. 


Theorem 7 For simplicity let the optimum be at x = 0. Let the expected progress 
be of the form E(||x||) = ellz||/n. Then the number of iterations IT needed to 
approximate the optimum up to a precision of € is given forn > 0 by 


IT = "mel (24) 


Cc € 


Proof: Let xo be the initial string, ro its euclidian norm. In a very crude statistical 
approximation we have 


€ 
ll = (ill == 
lill = [ea] >) 


Therefore we get 
Cai 
Izill = [lol]. ==) (25) 


The number of iterations IT can be computed from the equation 


tn = in Helly 


n zoll 
by Taylor series expension and ||x;r|=e. E 


Remark: The number of iterations linearly increases with n, if the initial string 
has the same euclidian distance in all dimensions. 

The above theorem was derived without statistical rigour. Therefore we investi- 
gated the mutation scheme with simulations. In table 1 numerical results are given 
for the initial string xo = (1,1,...,1). Note that the euclidian distance of this string 
is y/n. Therefore the number of iterations should approximately increase like nIn(n). 
The fitness function is the euclidean distance. 

The table 1 confirms the statistical analysis. The number of iterations increases 
like 32n - In(10°||xo||). We would like to mention that Solis and Wets(1981) report a 
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SD 
10 


0.1 2240 | 424 2578 
10 | 0.2 2197 | 433 
0.4 6090 | 1177 


10 

20 | 0. 701 5380 
20 | 0. 780 

20 | 0. 1390 
100 2101 29473 
100 1637 
100 5741 


Table 1: Number of iterations IT, termination € = 107? 


constant of 33 for their random search method. Their method dynamically adjusts 
the range of mutation. 

The number of iterations changes dramatically if the mutation rate grows. A 
very small change (in absolute terms) of the mutation rate from 1/100 to 4/100 has 
a huge impact. We have explained this behavior for discrete functions in (Mühlen- 
bein,1992a). 

The BGA mutation scheme can still be improved by two or more discrete adapta- 
tion steps. The adaptation works if not just a single point but a population of points 
are used. We outline the idea with an example. We restrict the BGA mutation range 
to say 9 points instead of 15 


(275A, ..., 2A) 


Then if all points of the population are within say a range of 277 - A, we change the 
mutation range to 


A =27.A 


This procedure is equivalent to dynamic parameter encoding techniques (Schrau- 
dolph & Belew, 1992). But note that the above procedure reduces the robustness 
of the mutation scheme. After a discrete adaptation step, points outside the new 
range cannot be generated by mutation anymore. 


4 The response to selection 


In this section we investigate the expected progress to the solution for the discrete 
recombination operator. Compared to mutation the following difficulty arises. Re- 
combination cannot be analyzed for a single point (individual), it needs at least 
two points. Furthermore recombination is not a random process, it depends on the 
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Figure 1: Regression of truncation selection 
Breeders often use truncation selection or mass selection as shown in figure 1. In 


truncation selection with threshold T, the T% best individuals will be selected as 
parents. T is normally chosen in the range 50% to 10%. 
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The prediction of the response to selection starts with 
Rt+1)=b- S(t) (28) 


The breeder either measures b; in previous generations or estimates b; by different 
methods (Crow, 1986). It is normally assumed that b; is constant for a certain 
number of generations. This leads to 


R(t +1) =b: S(t) (29) 


There is no genetics involved in this equation. It is simply an extrapolation from 
direct observation. The prediction of just one generation is only half the story. The 
breeder (and the GA user) would like to predict the cumulative response R,, for n 
generations of his breeding scheme. 


R, = R(t) (30) 


In order to compute R, a second equation is needed. In quantitative genetics, several 
approximate equations for S(t) are proposed (Bulmer, 1980), (Falconer, 1981). Un- 
fortunately these equations are not useful in the case of haploid organisms as used 
in our BGA. Therefore, we can only apply the research methods of quantitative 
genetics, not the results. 

Our approach has been influenced by Robertson, who wrote in (Robertson, 1970): 
“We may by conventional analysis discover that factor A and factor B have signifi- 
cant effects and that there is a significant interaction between them. It is however 
much more useful to find that an analysis in terms of, say, A/B accounts for almost 
all the variations due to both factors. In statistical terms, we are seeking the best 


‘re-parameterization’”. We will show that a re-parameterization is possible for the 


BGA. 
If the fitness values are normally distributed, the selection differential S(t) in 
truncation selection is approximately given by 


S=T1-o, (31) 


where o, is the standard deviation. / is called the selection intensity. The formula 
is a feature of the normal distribution. A derivation can be found in (Bulmer, 1980). 
In table 2 the relation between the truncation threshold 7 and the selection intensity 
I is shown. A decrease from 50% to 1% leads to an increase of the selection intensity 
from 0.8 to 2.66. 

If we insert (31) into (29) we obtain the well-known selection equation (Falconer, 


1981) 


R(t +1) =b-T-o,(t) (32) 


The science of artificial selection consists of estimating b and o,(t). The estimates 
depend on the fitness function. We will use as an introductory example the binary 
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0.34] 0.8 | 0.97 1.76 | 2.66 


Table 2: Selection intensity. 


ONEMAX function. Here the fitness is given by the number of 1’s in the binary 
string. 

In principle, the selection equation can be used for any populations. But the 
quality of the prediction depends on the size of the population and the genetic op- 
erators. The response to selection for the mutation operator is erratic for small 
populations. Therefore we have used in section 3 probability theory to predict the 
outcome of many trials. The behavior of recombination is much more predictable. 
For this reason we consider this case first. We assume uniform crossing-over for 
recombination (Syswerda, 1989). Uniform crossing-over is similar to discrete recom- 
bination in continuous domains. 

We will first estimate 6. A popular method for estimation is to make a regression 
of the midparent fitness value to the offspring. The midparent fitness value is defined 
as the average of the fitness of the two parents. A simple calculation shows that 
the probability of the offspring being better than the midparent is equal to the 
probability of them being worse. Therefore the average fitness of the offspring will 
be the same as the average of the midparents. The result means that the average 
of the offspring is the same as the average of the selected parents. This gives b = 1 
for ONEMAX. 

Estimating o,(t) is more difficult. We make the assumption that uniform crossing- 
over is a random process which creates a binomial fitness distribution with probabil- 
ity p(t). p(t) is the probability that there is a 1 at a locus. Therefore the standard 
deviation is given by 


op(t) = yn - p(t) - — p(t) (33) 
Noting that M(t) = n - p(t) we obtain from (26) and (32) the difference equation 
p(t-+ 1) = plt) = <=> y(t) =) (34) 


Jr 


The difference equation can be approximated by a differential equation 


dp(t) I 
do VR p(t): (1 — p(t) (35) 
The initial condition is p(0) = po. The solution of the differential equation is given 


by 
p(t) = 0.5 ( + sin( eI + arcsin(2po — D) (36) 
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14745.6 
14054.4 
14643.2 


Table 3: Averages over 50 runs for ONEMAX 


The convergence of the total population is characterized by p(GEN) = 1. GEN 
can be easily computed from the above equation It is given by 


GEN = (5 — arcsin(2po — 1)) - va (37) 


We have now arrived at a very interesting result. The number of generations 
needed until convergence is proportional to y/n and inversely proportional to 1/1. 

Because this result was obtained using a number of assumptions, we also in- 
vestigated the problem with extensive simulations. Some numerical results for the 
breeder genetic algorithm BGA with uniform crossing-over are given in table 3. Mu- 
tation is not used. 

We will try to capture the main results of table 3 in three empirical laws. 


Empirical Law 1 If the size of the population N is large enough that the popula- 
tion will converge to the optimum and the initial population is randomly generated 
(p(0) = 0.5), then the number of generations needed to converge to the optimum is 
given by 


GEN = kN! 0.8<I<16 kr2 (38) 


Note that GEN is only slightly larger than the analytically computed solution. 
GEN is independent of N for large N 4". It is a common belief by breeders 


4713The minimum N for which the population will converge seems to be proportional to In(n)/(n)- 


f(D) for I > 0.5. 
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T p os 05] 08] LO] a] IT) 21 


MinPop 90 64 50 48 54 65 90 | 200 


GEN 79 44 32 21 | 15.5 | 13.5 | 10.5 | 83 
FE 7110 | 2816 | 1600 | 1008 | 837 | 877 | 945 | 1660 


Table 4: Minimal population size MinPop, ONEMAX, n = 64 


(Falconer, 81) and GA researchers, that GEN depends mainly on N. But our 
simulations show that this is not the case, at least for the ONEMAX function. GEN 
only depends on the size of the problem n and the selection intensity /. Furthermore, 
increasing I by a factor of two halves the number of generations required. This 
result seems to suggest that a high selection intensity could be the best selection 
scheme. But this is of course not true. A high selection intensity leads to premature 
convergence and a bad quality of solution. 

In table 4 we investigate the minimal population size MinPop(I) for which the 
population will converge with given probability to the optimum. It is easily verified 
that MinPop has to increase for very large I. In this case just the best individual 
is selected. Then uniform crossing-over will only duplicate this individual. The 
population converges in one generation. Therefore the optimal solution has to be 
contained in the initial population. From standard probability theory we obtain 
MinPop > O(2"). 

The same argument shows that MinPop increases for very small selection in- 
tensities. Consequently there has to be a selection intensity for which Min Pop is 
minimal. In table 4 Min Pop is defined by the condition that 60% of the runs should 
terminate with the optimal solution. The smallest Min Pop is obtained for I = 0.8. 
The best efficiency in function evaluations FE is at I = 1. The efficiency increases 
only slightly between J = 1.0 and I = 1.6. Therefore we normally run the BGA 
with a selection intensity of I = 1.2. Note that these results have been derived for 
uniform crossing-over and the ONEMAX function only. 

The next empirical law can be derived from tables 3 and 4. 


Empirical Law 2 The number of generations until convergence is inversely pro- 
portional to I, if the same cumulative gain Raggy is required 


GEN «1/I 08<1< 1.6 (39) 
The third empirical law is a very crude estimate only. 


Empirical Law 3 The cumulative gain (i.e the total response to selection) is a 
monotonic function G of N/T for 0.8 <I < 1.7 


Ro G(N/D O08 <1< 1.7 (40) 
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From the third law we obtain as a rule of thumb 


Corollary 1 The minimal amount of computation in function evaluations FE = 
N -GEN to get the same total response is independent of the selection intensity. 


Proof 
N/T = GHR) 


If we assume GEN = k/I for some k the result follows. H 


We summarize the results. Uniform crossing-over and truncation selection can be 
analyzed with the methods developed in quantitative genetics. Thresholds in the 
range 50% < T < 10% give good results in terms of efficiency. 

Before we investigate truncation selection and recombination in the continuous 
domain, we will show that the above framework can also be applied to proportionate 
selection. 


5 Natural (proportionate) selection 
Proportionate selection is used by the simple genetic algorithm (Goldberg, 1989). 


Let 0 < p; < 1 be the proportion of genotype 2 in a population of size N, F; its 
fitness. Then the average fitnesss of the population is defined by 


M(t) = 2. Pit) Fi (41) 


In proportionate selection the phenotype distribution of the selected parents is given 


by 


pi(t)F; 
ig= 42 
Pis = MO (42) 
Theorem 8 In proportionate selection the selection differential is given by 
op (t) 
S(t)= — 43 
y= (13) 


Proof 


N P(E? — p(t) M?(1) 
= 2 M(t) 


We can compare truncation selection with proportionate selection by rewriting 
equation(43) 
op(t) 

S(t) = FAD ond (44) 
Equation (44) shows that the selection intensity in proportionate selection decreases 
with the inverse of the average fitness and proportionately to the standard deviation. 
The closer the population comes to the optimum, the less severe is the selection! 
Proportionate selection is afraid of reaching the goal. This result explains why pro- 
portionate selection is not a good strategy for optimization purposes (DeJong, 1992). 
Many application oriented genetic algorithms use modifications of the proportionate 
selection scheme. Our analysis has shown that these modifications are necessary, 
not tricks to speed up the algorithm. 

A recent overview and analysis of different selection schemes can be found in 
(Goldberg & Deb, 1991). Goldberg uses as performance measure the takeover time. 
Takeover is defined as the number of generations needed for convergence if the 
optimum is already contained in the population. We suggest analysis of the different 
selection schemes along the lines presented in this section. 


6 Discrete recombination 


In this section we will show that the major results of section 4 are also approxi- 
mately valid for continuous functions with a unimodal macro structure. We take as 
representative examples the unimodal function 


Fu) = Dei (45) 


and the multimodal Rastrigin function 


Folz) =n A+ (a? — A cos (2rz:)) — 5.12 < z; < 5.12,A =10.0 (46) 
1 


Fe has a large number of local minima, but they are unimodally distributed. On 
a large scale the structure of Fe is like a quadratic function. The best minima are 
centered around 0. 

The simulations have been done with discrete recombination and truncation 
selection. Discrete recombination for continuous functions is similar to uniform 
crossover in binary functions. In this case the BGA tries to solve a discrete problem 
with N alleles at each locus. Intermediate recombination as described in section 2 
is more probabilistical. It can generate new alleles which are not in the initial 
population. In this case the results of the discrete case cannot be applied. 
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256 | Fo | 1.6 | 13.9 | 1.6 | 3801 | 7.56 
256 | Fe | 1.6 | 13.8 | 1.0 | 3788 | 29.12 


192 | Fo | 1.2 | 18.2 | 2.8 | 3686 | 6.55 
128 | Fo | 0.8 | 27.3 | 3.9 | 3622 | 8.40 


Table 5: BGA with discrete recombination, n=20, 20 runs 


Table 5 shows the simulation results for different selection intensities I and pop- 
ulation sizes N. DF is the distance of the best value found to the global minimum 
of the function. The number of variables is 20. 

GEN denotes the number of generations until convergence, FE the amount of 
computation in function evaluations (FE = N- GEN). Convergence means that 
all individuals have the same value. SD is the standard deviation of GEN. The 
simulation results are in accordance with empirical law 2. The quality of the solution 
depends on N/I. The number of generations needed until convergence is inversely 
proportional to I. Therefore the amount of computation is almost the same in all 
cases. 

The difference between the results for fo and the multimodal function Fa is 
surprisingly small. The number of generations GEN until convergence is the same 
for both problems. For Fo a better quality of approximation is obtained. This 
result shows the advantage of truncation selection, which does not take into account 
smaller differences of the fitness functions. 

Table 6 shows the dependence of the quality of the solution on the size of the 
population. The selection intensity is fixed. The number of generations GEN until 
convergence increases approximately logarithmically with the size of the population 
N. At first this seems to contradict empirical law 1. But a closer look shows that 
the assumptions of the law are not fulfilled in continuous domains. The size of 
the population , necessary to reach the optimum with discrete recombination only, 
has to be infinite. But GEN increases much slower than N, the usual premise in 
quantitative genetics (Falconer, 1981). 

In table 7 simulation results for different numbers of variables are given. The 
results are in accordance to empirical law 2. But also empirical law 1 is approxi- 
mately valid. In the column QUOT the quotient GEN (2n)/GEN (n) is computed. 
Empirical law 1 predicts QUOT to be V2. The computed QUOT in table 7 is less 
than the predicted value. The reason for this behavior is similar to that before. If N 
is held fixed, then the quality of the solution obtained gets worse with higher dimen- 
sions. This means that the population will converge faster than a larger population 
which would give the same quality. 

The next figures show the dynamic behavior of truncation selection. In figure 2 
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STGENTSDT DET FE] 


Table 6: Quality of the solution for Fg, J=1.2, n=20 


IST IIGENT SO DrTavor) 


Table 7: Quality of the solution for Fg for constant N/I 


R(t) is shown for three selection intensities / and three population sizes N. N is 
chosen from table 5, so that approximately the same quality of solution is obtained. 
The R(t) curves behave as expected. A strong selection intensity I in connection 
with a large population gives a much higher R(t) at the beginning. Then it rapidly 
falls to zero. 

In figure 3 the progress of the best solution found is shown. We see that the three 
different runs behave very similarly. For the same number of function evaluations 
all three simulation runs give the same best fitness. This is a stronger result than 
that of table 5, where only the amount of computation at convergence is reported. 

In figure 4 the quotient R(t+1)/S(t) is shown. We see that it oscillates around 1. 
The largest population oscillates the least. The regression coefficient 6 is 1 for all 
runs. This result can be explained in a similar way to the discrete case in section 4. 

Figure 5 shows the fitness distribution of the population at generations 0, 2,4, 6. 
The fitness is symmetric about the average. It resembles a truncated normal distri- 
bution. 
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Figure 3: Fitness of the best individual Figure 4: R(t + 1)/S(t) 


7 Mutation and recombination 


In this section we will compare a BGA with mutation, with recombination and 
with mutation and recombination together. The search strategies of mutation and 
recombination are very different. Mutation is based on chance. It works in small 
populations most efficiently. For unimodal fitness functions the optimal size of the 
population is 1. The progress for a single mutation step is almost unpredictable. 
It needs probability theory and many trials to predict the behavior of this search 
operator. This analysis was done in section 3. 

Recombination is based on restricted chance. The bias is given by the current 
population. Discrete recombination only shuffles the alleles contained in the pop- 
ulation. The alleles of the optimum have to be present in the initial population. 
Otherwise recombination is not able to locate the optimum. The outcome of recom- 
bination is predictable by the selection equation if the population size is large. 

Table 8 compares the BGA with discrete recombination, intermediate recombi- 
nation, with mutation and with both mutation and recombination. 
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Figure 5: Fitness distribution of F6, n=20, N=1024 


Table 8 contains the most important result of this paper. 

A BGA with recombination and mutation outperforms a BGA with a single ge- 
netic operator. Mutation alone is more efficient than recombination alone. Mutation 
and recombination together have a synergetic effect. 

Intermediate recombination behaves similarly to discrete recombination. It is a 
good search strategy as long as the population does not get to similar. 

We will next show that in principle the selection equation can also be used for 
a BGA with mutation and recombination. In a large population recombination will 
be the dominating factor until the individuals get similar. The response to selection 
is predictable. In a small population mutation will dominate. The response to 
selection will be soon unpredictable. This behavior can be observed in figure 6. 
The response to selection R curve is smooth for the large population. It oscillates 
for the small population. The same behavior can be observed for the regression 
coeffient b in figure 7. For the large population bis approximately 1 until generation 
7. This value was predicted for discrete recombination. From generation 20 on the 
coefficient erraticly behaves for both populations. 
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OP 
0.00007 
s Drem] 85.0] 176.0] 

DR 


5.3 2; 


Njo | CENT SD] y FE] 
D 


20 196.60000 


20 50.0 98.70000 
20 564.3 51.2 0.00009 


20 420.0 51.2 0.00009 
341.6 78.7 0.00009 

DR 19.9 ; 20.60000 
252.1 0.00009 

DR&M 89.0 : 0.00009 


Table 8: Recombination(discrete DR, intermediate IR), mutation M for Fe 
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Figure 6: Response to Selection. Figure 7: R(t+1)/S(t) 


8 Numerical results 


The efficiency of the standard BGA will be demonstrated with a set of test func- 
tions. Ån extensive performance evaluation for continuous function optimization has 
been done for the PGA in (Mühlenbein, Schomisch & Born, 1991). It was shown in 
that analysis that varying the problem size n gives valuable information about the 
efficiency of the search method. Efficiency is defined in number of function evalua- 
tions needed to solve the problem. We will show in this section how valuable this 
information is for a comparison. 

We will present results for the test functions shown in table 11. Functions Fe-Fs 
are described in (Mühlenbein, Schomisch & Born, 1991). They are often used as 
test functions in global optimization (Torn & Zilinskas, 1989). Function Fy has been 
proposed by Ackley (1987). It has been subsequently used by Schwefel et al. for 
comparisons of different evolutionary algorithms (submitted for publication). 

Our results are shown in tables 9 and 10. The functions Fe, Fg and Fy have 
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Rastrigin’s Function Fe Schwefel’s Function F7 
Bm Inn) 2920 To] 
20 20 


17495 


20 | 52948 
20 | 112634 117433 
20 | 337570 338480 


200 | 2000 | 248000 309422 
400 | 4000 | 700000 699803 


| 
| 
20 | 25040 134471 


Table 9: Termination criterion € = 107! for Fe, €e = 10~* for Fr. 


Griewangk’s Function Fs | Ackley’s Function Fy | 
ap NT FE [osm Iman N] FE] Tn mn] 
20 66000 40023 30 | 20 | 19420 4733 
100 361722 307625 53860 36380 


200 748300 707855 107800 83713 
400 1630000 1600920 20 | 220820 189330 
20 | 548306 945713 


Table 10: Termination criterion e = 107? for Fg, € = 107? for Fo. 


been solved with a constant population size. These functions have a unimodal 
distribution of the local minima. The BGA mutation scheme is therefore able to 
find the minimum. 

A different behavior is to be expected for Schwefel’s function Fr. Fy does not 
have a unimodal macrostructure. The best minima are far away from each other. 
Furthermore, the global minimum is located at the boundary of the search domain 
(Mühlenbein, Schomisch & Born, 1991). For Fg extended intermediate recombina- 
tion was used. 

In the tables 9 and 10 the termination criterion term is fulfilled if one of the 
objectives | FPCA — Freest| < e. [Fet] or [FBGA — phest| < © if PP = 0 is achieved. 

The search time in function evaluations scales almost exactly with n - In(n) for 
the functions fg and Fg. It scales linearly for function Fy in the range n = 100 till 
n = 1000. Only function F7 gives slightly different results. 

These results can be predicted by the BGA theory. All the test functions have 
a global structure which makes them easy to optimize. The results of the BGA 
are better than those of the PGA. The function evaluations FE cannot be directly 
compared because the termination criterion was different for the PGA. But the PGA 
did scale like n\/n in a much smaller range of n. Thus the performance of the BGA 
gets better compared to the PGA, the larger the problem size n is. This example 
shows the advantage of investigating the scaling of heuristic search methods. 
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Fe(z) =nA+ > (x? — A cos (?rx;)) —5.12 < xz; < 5.12 
1 
F(2) = —z; sin ( fel) —500 < a; < 500 
1 
F(a) = © söt — II cos (4) +1 —600 < x; < 600 
1 1 t 
(x) 


Table 11: Test functions. 


Recent applications of evolutionary strategies to function optimization are re- 
ported in (Back & Hoffmeister, 1991) (Eshelman & Schaffer, 1991), (Schraudolph & 
Belew, 1992) (Voigt, Santibanez-Koref & Born, 1992), (Born, Voigt & Santibanez- 
Koref, 1992). We do not make explicit comparisons here. Instead we hope that the 
authors investigate the scaling of their methods. This will make comparisons very 
simple. 


9 Conclusion 


The BGA is a robust global optimization method based on a solid theory. Selection, 
recombination and mutation are well tuned and have a synergetic effect. The only 
parameter to be input by the user is the size of the population. 

The BGA is inspired by artificial selection as performed by human breeders. But 
mutation and recombination are based on mathematical search techniques. The 
BGA mutation scheme is able to optimize many multimodal functions. The BGA 
solved in this paper some of the most popular test functions in global optimization 
in O(n - In(n)) function evaluations, the same number as for unimodal functions. 
This result demonstrates that these test functions are not as difficult to optimize 
than was formerly believed. 

The standard BGA is no miracle. It is not difficult to construct challenging opti- 
mization problems for it. These problems have deep and narrow valleys of unknown 
directions. Progress is made only in following these valleys. Line recombination is 
a good search operator for such a problem. But are such problems typical applica- 
tions? We did not yet encounter such a problem. 

The BGA has been successfully applied to a number of real world applications. 
The largest application so far was the determination of discriminance functions of 
560 variables for pattern recognition. The BGA solved this problem easily. 

The BGA will be extended in two directions. First, the virtual breeder will 
continuously monitor its population and take appropriate actions. Second, more 
genetic operators will be implemented. This means that our virtual breeder may 
use “biotechnology” to improve the progress of its population. The genetic operators 
will be tested in parallel in different subpopulations. The operators which give good 
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results will grow in the total population. This framework has been implemented in 
our Distributed Breeder Genetic Algorithm. 
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